Abstract. To study quantum computation, it might be helpful to generalize structures from language
and automata theory to the quantum case. To that end, we propose quantum versions of finite-state 
and push-down automata, and regular and context-free grammars. We find analogs of several classical 
theorems, including pumping lemmas, closure properties, rational and algebraic generating functions, 
and Greibach normal form. We also show that there are quantum context-free languages that are not 
context-free. 

1 Introduction 

Nontraditional models of computation — such as real-valued, analog, spatial, molecular, stochastic, and 
quantum computation — have received a great deal of interest in both physics and computer science in 
recent years (e.g. [1, 4, 10, 21, 8, 31, 9]). This stems partly from a desire to understand computation in 
dynamical systems, such as ordinary differential equations, iterated maps, cellular automata, and recurrent 
neural networks, and partly from a desire to circumvent the fundamental limits on current computing
technologies by inventing new computational model classes. 

Quantum computation, in particular, has become a highly active research area. This is driven by the 
recent discovery of quantum algorithms for factoring that operate in polynomial time [29], the suggestion 
that quantum computers can be built using familiar physical systems [7, 14, 19], and the hope that errors 
and decoherence of the quantum state can be suppressed so that such computers can operate for long times 
[30, 33]. 

If we are to understand computation in a quantum context, it might be useful to translate as many 
concepts as possible from classical computation theory into the quantum case. From a practical viewpoint, 
we might as well start with the lowest levels in the computational hierarchy and work upward. In this paper 
we begin in just this way by defining quantum versions of the simplest language classes — the regular and 
context-free languages [16]. 

To do this, we define quantum finite-state and push-down automata (QFAs and QPDAs) as special 
cases of a more general object, a real-time quantum automaton. In this setting a formal language becomes a 
function that assigns quantum probabilities to words. We also define quantum grammars, in which we sum 
over all derivations to find the amplitude of a word. We show that the corresponding languages, generated 
by quantum grammars and recognized by quantum automata, have pleasing properties in analogy to their 
classical counterparts. These properties include pumping lemmas, closure properties, rational and (almost) 
algebraic generating functions, and Greibach normal form. 

For the most part, our proofs simply consist of tracking standard results in the theory of classical languages 
and automata, stochastic automata, and formal power series, and attaching complex amplitudes to the 
transitions and productions of our automata and grammars. In a few places — notably, lemmas 12 and 13 
and theorems 6, 7, 10, 19, 23, and 24 — we introduce genuinely new ideas. 

We believe that this strategy of starting at the lowest rungs of the Chomsky hierarchy has several 
benefits. First, we can make concrete comparisons between classical and quantum computational models. 
This comparison is difficult to make for more powerful models, because of unsolved problems in computer 
science (for instance, deterministic vs. quantum polynomial time). 

Second, studying the computational power of a physical system can give detailed insights into a natural 
system's structure and dynamics. For example, it may be the case that the spatial density of physical 
computation is finite. In this case, every finite quantum computer is actually a QFA. If a system does in fact 
have infinite memory, it makes sense to ask what kinds of long-time correlations it can have, such as whether 
its memory is stack-like or queue-like. Our QPDAs provide a way to formalize these questions. 

Molecular biology suggests another example along these lines, the class of protein secondary structures
coded for by RNA. To some approximation the long-range correlations between RNA nucleotide base pairs
responsible for secondary structure can be modeled by parenthesis-matching grammars [28, 27]. Since RNA
macromolecules are quantum mechanical objects, constructed by processes that respect atomic and molecular
quantum physics, the class of secondary structures coded for by RNA may be more appropriately modeled
by the quantum analogs of context-free grammars introduced here. In the same vein, DNA and RNA
nucleotide sequences are recognized and manipulated by various active molecules (e.g. transcription factors
and polymerases). Could their functioning be modeled by QFAs and QPDAs?

Finally, the theory of context-free languages has been extremely useful in designing compilers, parsing 
algorithms, and programming languages for classical computers. Is it possible that quantum context-free 
languages can play a similar role in the design of quantum computers and algorithms? 



1.1 Quantum mechanics 

First, we give a brief introduction to quantum mechanics [34]. 

A quantum system's state is described by a vector of complex numbers. The dimension of a quantum
system is the number of complex numbers in its state vector. A column vector is written $|a\rangle$ and its
Hermitian conjugate $|a\rangle^\dagger$, the complex conjugate of its transpose, is the row vector $\langle a|$. These vectors
live in a Hilbert space $H$, which is equipped with an inner product $a \cdot b = \langle a|b\rangle$. The probability of
observing a given state $a$ is its norm $|a|^2 = \langle a|a\rangle$.

Over time, the dynamics of a quantum system rotates the state $|a\rangle$ in complex vector space by a unitary
matrix $U$, one whose inverse is equal to its Hermitian conjugate, $U^\dagger = U^{-1}$. Then the total probability of
the system is conserved, since if $\langle a'| = \langle a|U$, then $\langle a'|a'\rangle = \langle a|UU^\dagger|a\rangle = \langle a|a\rangle$.

The eigenvalues of a unitary matrix are of the form $e^{i\omega}$, where $\omega$ is a real-valued angle, and so are
restricted to the unit circle in the complex plane. Thus, the dynamics of an n-dimensional quantum system,
which is governed by an $n \times n$ unitary matrix, is simply a rotation in $\mathbb{C}^n$. In the Schrödinger equation, $U$ is
determined by the Hamiltonian or energy operator $\mathcal{H}$ via $U = e^{i\mathcal{H}t}$.

A measurement consists of applying an operator $O$ to a quantum state $a$. We will write operators on the
right, $\langle a|O$. To correspond to a classical observable, $O$ must be Hermitian, $O^\dagger = O$, so that its eigenvalues
are real and so "measurable". If one of its eigenvalues $\lambda$ is associated with a single eigenvector $u_\lambda$, then we
observe the outcome $O = \lambda$ with a probability $|\langle a|u_\lambda\rangle|^2$, where $\langle a|u_\lambda\rangle$ is the component of $a$ along $u_\lambda$.

More generally, if there is more than one eigenvector $u_\lambda$ with the same eigenvalue $\lambda$, then the probability
of observing $O = \lambda$ when the system is in state $a$ is $|aP_\lambda|^2$, where $P_\lambda$ is a projection operator such that
$\langle u_\mu|P_\lambda = \langle u_\mu|$ if $\mu = \lambda$ and $0$ otherwise. Thus, $P_\lambda$ projects $a$ onto the subspace of $H$ spanned by the $u_\lambda$.

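For instance, suppose that we consider a two-dimensional quantum system with Hamiltonian
$\mathcal{H} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. Then $U = e^{i\mathcal{H}t} = \begin{pmatrix} e^{it} & 0 \\ 0 & e^{-it} \end{pmatrix}$. The eigenvectors of $\mathcal{H}$ are
$\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ with eigenvalues $+1$ and $-1$, respectively. If the system is in the state
$\langle a| = (\sqrt{3}/2, -i/2)$, a measurement of the energy $\mathcal{H}$ will yield $+1$ or $-1$ with probabilities 3/4 and 1/4,
respectively. The projection operators are $P_{+1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $P_{-1} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.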
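
The measurement rule can be checked numerically. The following is a minimal sketch, assuming that the two eigenvectors are the standard basis vectors (an assumption consistent with the stated probabilities); the names `outcome_probability` and `probs` are illustrative only.

```python
import math

# State <a| = (sqrt(3)/2, -i/2), written as a row of complex amplitudes.
a = [complex(math.sqrt(3) / 2, 0.0), complex(0.0, -0.5)]

# Assumed eigenvectors for the eigenvalues +1 and -1 (standard basis).
eigenvectors = {+1: [1.0, 0.0], -1: [0.0, 1.0]}

def outcome_probability(state, eigenvector):
    """Return |<state|eigenvector>|^2 for a row-vector state."""
    amplitude = sum(s * u for s, u in zip(state, eigenvector))
    return abs(amplitude) ** 2

# Probability of each energy outcome, keyed by eigenvalue.
probs = {value: outcome_probability(a, u) for value, u in eigenvectors.items()}
```

Under these assumptions `probs[+1]` and `probs[-1]` agree with the values 3/4 and 1/4 quoted above, up to floating-point rounding.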



1.2 Classical finite automata and regular languages 

Readers familiar with basic automata theory should skip this section and the next two. An introduction can 
be found in [16]. 

If $A$ is an alphabet or set of symbols, $A^*$ is the set of all finite sequences or words over $A$ and a language
$L$ over $A$ is a subset of $A^*$. If $w$ is a word, then $|w|$ is its length and $w_i$ is its i'th symbol. We denote the
empty word by $\epsilon$, the concatenation of two words $u$ and $v$ as $uv$, and $w$ repeated $k$ times as $w^k$.

A deterministic finite-state automaton (DFA) consists of a finite set of states $S$, an input alphabet $A$, a
transition function $F : S \times A \to S$, an initial state $s_{\mathrm{init}} \in S$, and a set of accepting states $S_{\mathrm{accept}} \subset S$. The
machine starts in $s_{\mathrm{init}}$ and reads an input word $w$ from left to right. At the ith step, it reads a symbol $w_i$
and updates its state to $s' = F(s, w_i)$. It accepts $w$ if the final state reached after reading $w_{|w|}$ is in $S_{\mathrm{accept}}$.
We say the machine recognizes the language of accepted words.

A nondeterministic finite-state automaton (NFA) has a transition function into the power set of $S$,
$F : S \times A \to \mathcal{P}(S)$, so that there may be several transitions the machine can make for each symbol. An NFA
accepts if there is an allowed computation path, i.e. a series of allowed transitions, that leads to a state in
$S_{\mathrm{accept}}$.

As it turns out, DFAs and NFAs recognize exactly the same languages, since an NFA with a set of states 
S can be simulated by a DFA whose states correspond to subsets of S. If a language can be recognized by a 
DFA or NFA, it is called regular. 

For instance, the set of words over $A = \{a, b\}$ where no two $b$'s occur consecutively is regular. If
$S = \{A, B, R\}$, $s_{\mathrm{init}} = A$, $S_{\mathrm{accept}} = \{A, B\}$, and

$$F(A,a) = F(B,a) = A, \quad F(A,b) = B, \quad F(B,b) = R, \quad F(R,a) = F(R,b) = R$$

then we enter the 'reject' state $R$, and stay there, whenever we encounter the string $bb$. $A$, $S$, $s_{\mathrm{init}}$, $S_{\mathrm{accept}}$,
and $F$ constitute a DFA.
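
As an illustration, the DFA above can be simulated directly. This is a minimal sketch; storing the transition function as a dictionary and the name `dfa_accepts` are illustrative choices, not part of the original definition.

```python
# Transition function F of the "no two consecutive b's" DFA.
F = {
    ("A", "a"): "A", ("B", "a"): "A", ("A", "b"): "B",
    ("B", "b"): "R", ("R", "a"): "R", ("R", "b"): "R",
}
S_INIT = "A"
S_ACCEPT = {"A", "B"}

def dfa_accepts(word):
    """Run the DFA on word, making one transition per input symbol."""
    state = S_INIT
    for symbol in word:
        state = F[(state, symbol)]
    return state in S_ACCEPT
```

For example, `dfa_accepts("abab")` is true while `dfa_accepts("abba")` is false, since the latter contains `bb`.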

One way to view finite-state automata is with matrices and vectors. If an NFA has $n$ states, the set of
allowed transitions can be described by an $n \times n$ transition matrix $M_a$ for each symbol $a \in A$, in which
$(M_a)_{ij} = 1$ if and only if the transition from state $i$ to state $j$ is allowed on reading $a$. Then if $s_{\mathrm{init}}$ is the
n-component column vector

$$(s_{\mathrm{init}})_i = \begin{cases} 1 & \text{if } i = s_{\mathrm{init}} \\ 0 & \text{otherwise} \end{cases}$$

and $P_{\mathrm{accept}}$ is the column vector

$$(P_{\mathrm{accept}})_i = \begin{cases} 1 & \text{if } i \in S_{\mathrm{accept}} \\ 0 & \text{otherwise} \end{cases}$$

then the number of accepting paths on an input $w$ is

$$f(w) = s_{\mathrm{init}}^T \cdot M_w \cdot P_{\mathrm{accept}} \qquad (1)$$

where $M_w$ is shorthand for $M_{w_1} M_{w_2} \cdots M_{w_{|w|}}$. Then a word $w$ is accepted if $f(w) > 0$, so that there is some
path leading from $s_{\mathrm{init}}$ to the accepting subspace spanned by $s \in S_{\mathrm{accept}}$. (We apply the matrices on the
right, so that they occur in the same order as the symbols of $w$, instead of in reverse.) Of course, $M_\epsilon$ is the
identity matrix, which we will denote $\mathbf{1}$.

Equation (1) will be our starting point for defining quantum versions of finite-state automata and regular 
languages. 
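
Equation (1) can be evaluated mechanically. The sketch below does so for the "no bb" DFA of this section, with states ordered (A, B, R); the helper names `vec_mat` and `accepting_paths` are illustrative.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# (M_a)_ij = 1 iff the transition i -> j is allowed on reading a.
M = {
    "a": [[1, 0, 0], [1, 0, 0], [0, 0, 1]],
    "b": [[0, 1, 0], [0, 0, 1], [0, 0, 1]],
}
s_init = [1, 0, 0]      # start in state A
p_accept = [1, 1, 0]    # A and B are accepting

def accepting_paths(word):
    """Evaluate f(w) = s_init^T . M_w . P_accept, applying matrices in input order."""
    v = s_init
    for symbol in word:
        v = vec_mat(v, M[symbol])
    return sum(x * p for x, p in zip(v, p_accept))
```

Because this machine is deterministic, `accepting_paths` returns 1 for accepted words and 0 otherwise; for an NFA the same code would count all accepting paths.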



1.3 Push-down automata and context-free languages 

A push-down automaton (PDA) is a finite-state automaton or 'control' that also has access to a stack, an
infinite memory storing a string of symbols in some alphabet $T$. Its transition function
$F : S \times T \times A \to \mathcal{P}(S \times T^*)$ allows it to examine its control state, the top stack symbol, and the input symbol.
It then updates its control state, pops the top symbol off the stack, and pushes a (possibly empty) word onto
the stack. A PDA starts with an initial state and stack configuration. After reading a word, it accepts if a
computation path exists that either ends in an accepting control state or produces an empty stack.

PDAs recognize the context-free languages (CFLs), a name whose motivation will become clear in a 
moment. For instance, the Dyck language of properly nested words of brackets $\{\epsilon, (), (()), ()(), (()()), \ldots\}$ is
context-free. It is recognized by a PDA with a single stack symbol x. This PDA pushes an x onto the stack 
when it sees a "(" and pops one off when it sees a ")". If it ever attempts to pop a symbol off an empty 
stack, it enters the reject state and stays there. 
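
The single-symbol PDA just described can be sketched as follows. The function name `dyck_accepts` and the choice to accept by empty stack are assumptions made for illustration.

```python
def dyck_accepts(word):
    """Recognize the Dyck language with a stack over the single symbol 'x'."""
    stack = []
    for symbol in word:
        if symbol == "(":
            stack.append("x")        # push on an opening bracket
        elif symbol == ")":
            if not stack:
                return False         # pop from an empty stack: reject for good
            stack.pop()
        else:
            return False             # symbol outside the alphabet {(, )}
    return not stack                 # accept if the stack is empty at the end
```

Since only one stack symbol is used, the stack acts as a counter of unmatched opening brackets.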

A deterministic push-down automaton (DPDA) is one with at most one allowed transition for each 
combination of control state, stack symbol, and input symbol. DPDAs recognize the deterministic context- 
free languages (DCFLs), such as the Dyck language above. 

1.4 Grammars, context-free and regular

A grammar consists of two alphabets $V$ and $T$, the variables and terminals, an initial variable $I \in V$, and a
set $P$ of productions $\alpha \to \beta$ where $\alpha \in V^*$ and $\beta \in (V \cup T)^*$. A derivation $\alpha \Rightarrow \beta$ is a chain of strings, where
at each step one substring is replaced with another according to one of the productions. Then the language
generated by the grammar consists of those strings in $T^*$ (consisting only of terminals) that can be derived
from $I$ with a chain of productions in $P$.

For example, the grammar $V = \{I\}$, $T = \{(, )\}$, and $P = \{I \to (I)I, \; I \to \epsilon\}$ generates the Dyck
language. Note that the left-hand side of each production consists of a single symbol and does not require
any neighboring symbols to be present; hence the term context-free. Context-free grammars generate exactly
the languages recognized by PDAs.
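
To make the notion of a derivation concrete, the sketch below enumerates leftmost derivations of a Dyck grammar. It assumes the two productions I → (I)I and I → ε (one standard unambiguous choice); the function name `derive` is illustrative.

```python
def derive(max_len):
    """Return {word: number of leftmost derivations} for words of length <= max_len."""
    counts = {}
    frontier = ["I"]                       # sentential forms still to expand
    while frontier:
        form = frontier.pop()
        i = form.find("I")                 # position of the leftmost variable
        if i < 0:                          # no variables left: a terminal word
            counts[form] = counts.get(form, 0) + 1
            continue
        for rhs in ("(I)I", ""):           # assumed productions I -> (I)I | epsilon
            new = form[:i] + rhs + form[i + 1:]
            if len(new.replace("I", "")) <= max_len:
                frontier.append(new)
    return counts
```

With these productions, `derive(4)` finds each of the words ε, `()`, `(())`, and `()()` exactly once, which is what one expects of a grammar in which every word has a single derivation tree.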

The Dyck language grammar is unambiguous in that every word has a unique derivation tree. A context- 
free language is unambiguous if there is an unambiguous grammar that generates it. Notably, there are 
inherently ambiguous context-free languages for which no unambiguous grammar exists. 

If we restrict a grammar further so that every production is of the form $v_1 \to w v_2$ or $v_1 \to w$, where
$w \in T^*$ and $v_1, v_2 \in V$, then there is never more than one variable present in the string. The result is that a
derivation leaves strings of terminals behind the variable as it moves to the right. Such grammars are called
regular and generate exactly the regular languages.

1.5 Quantum languages and automata 

Since quantum systems predict observables in a probabilistic way, it makes sense to define a quantum language
as a function mapping words to probabilities, $f : A^* \to [0, 1]$. This generalizes the classical Boolean situation
where each language has a characteristic function $\chi_L : A^* \to \{0, 1\}$, defined as $\chi_L(w) = 1$ if $w \in L$ and
$0$ otherwise. (In fact, in order to compare our quantum language classes with the classical ones, we will
occasionally abuse our terminology by identifying a Boolean language with its characteristic function, saying
that a language is in a given class if its characteristic function is.)

Then in analogy to equation (1), we define quantum automata in the following way: 

Definition. A real-time quantum automaton (QA) Q consists of 

- a Hilbert space H, 

- an initial state vector $s_{\mathrm{init}} \in H$ with $|s_{\mathrm{init}}|^2 = 1$,

- a subspace $H_{\mathrm{accept}} \subset H$ and an operator $P_{\mathrm{accept}}$ that projects onto it,

- an input alphabet $A$, and

- a unitary transition matrix $U_a$ for each symbol $a \in A$.

Then using the shorthand $U_w = U_{w_1} U_{w_2} \cdots U_{w_{|w|}}$, we define the quantum language recognized by $Q$ as the
function

$$f_Q(w) = |\langle s_{\mathrm{init}}| U_w P_{\mathrm{accept}}|^2$$

from words in $A^*$ to probabilities in $[0, 1]$. (Again, we apply linear operators on the right, so that the symbols
$w_i$ occur in left-to-right order.)

In other words, we start with $\langle s_{\mathrm{init}}|$, apply the unitary matrices $U_{w_i}$ for the symbols of $w$ in order, and
measure the probability that the resulting state is in $H_{\mathrm{accept}}$ by applying the projection operator $P_{\mathrm{accept}}$ and
measuring the norm. This is a real-time automaton since it takes exactly one step per input symbol, with
no additional computation time after the word is input.
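
The definition translates almost line by line into code. The following is a minimal sketch for a hypothetical two-dimensional automaton over the one-letter alphabet {a}, in which each symbol rotates the state by π/4; the automaton and the names `rotation`, `vec_mat`, and `f_Q` are assumptions made for illustration.

```python
import math

def vec_mat(v, M):
    """Row vector times matrix, so symbols are applied in left-to-right order."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def rotation(theta):
    """A 2x2 rotation matrix; it is real and orthogonal, hence unitary."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

U = {"a": rotation(math.pi / 4)}          # hypothetical transition matrix U_a
s_init = [1.0, 0.0]                        # unit-norm initial state
P_accept = [[0.0, 0.0], [0.0, 1.0]]        # projects onto the second basis vector

def f_Q(word):
    """f_Q(w) = |<s_init| U_w P_accept|^2."""
    v = s_init
    for symbol in word:
        v = vec_mat(v, U[symbol])
    v = vec_mat(v, P_accept)
    return sum(abs(x) ** 2 for x in v)
```

For this example the acceptance probability of a word of length k is sin²(kπ/4), so it oscillates between 0 and 1 rather than settling on a Boolean answer.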

Physically, this can be interpreted as follows. We have a quantum system prepared in a superposition of
initial states. We expose it over time to different influences depending on the input symbols, one time-step
per symbol. At the end of this process, we perform a measurement on the system and $f(w)$ is the probability
of this measurement having an acceptable outcome, such as being in a given energy level.

Note that $f$ is not a measure on the space of words. It is the probability of a particular measurement
after a given input.

This basic setting is not new. If we restrict ourselves to real rather than complex values and replace
unitarity of the transition matrices with stochasticity in which the elements of each row of the $U_a$ sum to 1,
we get the stochastic automata of Rabin [24]; see also the review in [20]. If we generalize the $U_a$ to nonlinear
maps in $\mathbb{R}^n$, we get real-time dynamical recognizers [22]. If we generalize the $U_a$ to nonlinear Bayes-optimal
update maps of the n-simplex, we get ε-machine deterministic representations of recurrent hidden Markov
models [8, 36].

Note that the effect of the matrix product $U_w = U_{w_1} U_{w_2} \cdots U_{w_{|w|}}$ is to sum over all possible paths that
the machine can take. Each path has a complex amplitude equal to the product of the amplitudes of the
transitions at each step. Each of $U_w$'s components, representing possible paths from an initial state $s_0$ to a
final state $s_{|w|}$, is the sum of these. That is,

$$(U_w)_{s_0, s_{|w|}} = \sum_{s_1, s_2, \ldots, s_{|w|-1}} (U_{w_1})_{s_0, s_1} (U_{w_2})_{s_1, s_2} \cdots (U_{w_{|w|}})_{s_{|w|-1}, s_{|w|}}$$

over all possible choices of the intervening states $s_1, \ldots, s_{|w|-1}$. The difference from the real-valued
(stochastic) case is that destructive interference can take place. Two paths can have opposite phases in the
complex plane and cancel each other out, leaving a total probability less than the sum of the two, since
then $|a + b|^2 < |a|^2 + |b|^2$.

Note that paths ending in different perpendicular states in $H_{\mathrm{accept}}$ add noninterferingly, $|a|^2 + |b|^2$, while
paths ending in the same state add interferingly, $|a + b|^2$. This will come up several times in discussion below.
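
A two-state example makes the cancellation explicit. The sketch below uses a real unitary (Hadamard-type) matrix chosen purely for illustration and compares the interfering sum $|a + b|^2$ with the noninterfering sum $|a|^2 + |b|^2$ over the two paths of length two from state 0 to state 1.

```python
import math

h = 1 / math.sqrt(2)
U = [[h, h], [h, -h]]          # a real unitary matrix (illustrative choice)

def path_amplitudes(start, end):
    """Amplitudes of the two-step paths start -> s1 -> end, one per middle state s1."""
    return [U[start][s1] * U[s1][end] for s1 in range(2)]

paths = path_amplitudes(0, 1)
interfering = abs(sum(paths)) ** 2                  # amplitudes add, then square
noninterfering = sum(abs(p) ** 2 for p in paths)    # squared amplitudes add
```

Here the two path amplitudes are +1/2 and -1/2, so the interfering total vanishes while the noninterfering total is 1/2.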

In analogy with Turakainen's generalized stochastic automata [35] where the transition matrices do not 
necessarily preserve probability, we will sometimes find it useful to relax unitarity: 

Definition. A generalized real-time quantum automaton is one in which the matrices $U_a$ are not necessarily
unitary and the norm of the initial state $s_{\mathrm{init}}$ is not necessarily 1.

We can then define different classes of quantum automata by restricting the Hilbert space $H$ and the
transition matrices $U_a$ in various ways; first to the finite-dimensional case and then to an infinite memory in
the form of a stack.

2 Quantum finite-state automata and regular languages 

The quantum analog of a finite-state machine is a system with a finite-dimensional state space, so 

Definition. A quantum finite-state automaton (QFA) is a real-time quantum automaton where $H$, $s_{\mathrm{init}}$,
and the $U_a$ all have a finite dimensionality $n$. A quantum regular language (QRL) is a quantum language
recognized by a QFA.

In this section, we will try to reproduce as many results as possible on classical regular languages in the 
quantum case. 

2.1 Closure properties of QRLs 

First, we define two operations on quantum automata that allow us to add and multiply quantum languages. 
The result is that the set of QRLs is closed under these operations, just as stochastic languages are [20, 23]. 

Definition. If $u$ and $v$ are vectors of dimension $m$ and $n$, respectively, their direct sum $u \oplus v$ is the
$(m + n)$-dimensional vector $(u_1, \ldots, u_m, v_1, \ldots, v_n)$. If $M$ and $N$ are matrices, then
$M \oplus N = \begin{pmatrix} M & 0 \\ 0 & N \end{pmatrix}$.

Then if $Q$ and $R$ are quantum automata with the same input alphabet, and if $a$ and $b$ are complex
numbers such that $|a|^2 + |b|^2 = 1$, the weighted direct sum $aQ \oplus bR$ has initial state
$s'_{\mathrm{init}} = a s^Q_{\mathrm{init}} \oplus b s^R_{\mathrm{init}}$, projection operator $P'_{\mathrm{accept}} = P^Q_{\mathrm{accept}} \oplus P^R_{\mathrm{accept}}$, and transition matrices
$U'_a = U^Q_a \oplus U^R_a$.

Lemma 1. If $Q$ and $R$ are QFAs and if $|a|^2 + |b|^2 = 1$, then $aQ \oplus bR$ is a QFA and
$f_{aQ \oplus bR} = |a|^2 f_Q + |b|^2 f_R$. Therefore, if $f_1, f_2, \ldots, f_k$ are QRLs, then $\sum_{i=1}^{k} c_i f_i$ is a QRL for any real
constants $c_i > 0$ such that $\sum_{i=1}^{k} c_i = 1$.
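
Proof. Clearly $|s'_{\mathrm{init}}|^2 = |a s^Q_{\mathrm{init}}|^2 + |b s^R_{\mathrm{init}}|^2 = |a|^2 + |b|^2 = 1$. The direct sum of two subspaces is a subspace,
the direct sum of unitary matrices is unitary, and the direct sum of two finite-dimensional quantum automata
is finite-dimensional, so $aQ \oplus bR$ is a QFA.
Furthermore, $U'_w = U^Q_w \oplus U^R_w$ and

$$f_{aQ \oplus bR}(w) = |s'_{\mathrm{init}} U'_w P'_{\mathrm{accept}}|^2 = |a s^Q_{\mathrm{init}} U^Q_w P^Q_{\mathrm{accept}}|^2 + |b s^R_{\mathrm{init}} U^R_w P^R_{\mathrm{accept}}|^2 = |a|^2 f_Q(w) + |b|^2 f_R(w)$$

(Note that the phases of $a$ and $b$ don't matter, only their norms.) By induction we can sum any $k$ QRLs in
this way, as long as $\sum_{i=1}^{k} c_i = 1$. □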
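
The weighted direct sum can be checked numerically. The sketch below builds $aQ \oplus bR$ for two hypothetical rotation automata over the alphabet {a} and compares its language with $|a|^2 f_Q + |b|^2 f_R$; all names and the particular automata are illustrative assumptions.

```python
import math

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def rotation(theta):
    """2x2 rotation matrix (real orthogonal, hence unitary)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def direct_sum(M, N):
    """Block-diagonal matrix with M and N on the diagonal."""
    m, n = len(M[0]), len(N[0])
    return [row + [0.0] * n for row in M] + [[0.0] * m + row for row in N]

def f(s_init, U, P, word):
    """f(w) = |<s_init| U_w P|^2 for a one-letter alphabet with matrix U."""
    v = s_init
    for _ in word:
        v = vec_mat(v, U)
    return sum(abs(x) ** 2 for x in vec_mat(v, P))

# Two hypothetical QFAs over {a}: rotations by pi/4 and by pi/3.
s, P = [1.0, 0.0], [[0.0, 0.0], [0.0, 1.0]]
U_Q, U_R = rotation(math.pi / 4), rotation(math.pi / 3)
a, b = 0.6, 0.8j                      # |a|^2 + |b|^2 = 1; b carries a phase

s_sum = [a * x for x in s] + [b * x for x in s]
U_sum, P_sum = direct_sum(U_Q, U_R), direct_sum(P, P)
```

Evaluating `f(s_sum, U_sum, P_sum, w)` for a few words `w` reproduces `abs(a)**2 * f(s, U_Q, P, w) + abs(b)**2 * f(s, U_R, P, w)`, and the phase of `b` has no effect.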

Definition. If $u$ and $v$ are vectors of dimension $m$ and $n$, respectively, then their tensor product $u \otimes v$ is
the $mn$-dimensional vector $(u \otimes v)_{(i,j)} = u_i v_j$ where $(i,j) = n(i - 1) + j$, say, is a pairing function. If $M$ and $N$
are $m \times m$ and $n \times n$ matrices, $M \otimes N$ is the $mn \times mn$ matrix $(M \otimes N)_{(i,k),(j,l)} = M_{ij} N_{kl}$. Then if $Q$ and $R$ are
quantum automata with the same input alphabet, $Q \otimes R$ is defined by taking the tensor products of their
respective $s_{\mathrm{init}}$, $P_{\mathrm{accept}}$, and the $U_a$.

Lemma 2. If $Q$ and $R$ are QFAs, then $Q \otimes R$ is a QFA and $f_{Q \otimes R} = f_Q f_R$. Therefore, the product of any
number of QRLs is a QRL.

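Proof. It is easy to show that if $a$ and $c$ are m-dimensional vectors and $b$ and $d$ are n-dimensional vectors,
then $\langle a \otimes b | c \otimes d \rangle = \langle a|c\rangle \langle b|d\rangle$. Therefore, $|s'_{\mathrm{init}}|^2 = |s^Q_{\mathrm{init}}|^2 \, |s^R_{\mathrm{init}}|^2 = 1$. The tensor product of
finite-dimensional unitary matrices is unitary and finite-dimensional, so $Q \otimes R$ is a QFA.
Furthermore, $U'_w = U^Q_w \otimes U^R_w$ and

$$f_{Q \otimes R}(w) = |s^Q_{\mathrm{init}} U^Q_w P^Q_{\mathrm{accept}}|^2 \cdot |s^R_{\mathrm{init}} U^R_w P^R_{\mathrm{accept}}|^2 = f_Q(w) f_R(w)$$

By induction we can multiply any number of QRLs in this way. □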
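
The tensor-product construction can be verified in the same spirit. This sketch implements the pairing-function definition with zero-based indices and checks $f_{Q \otimes R} = f_Q f_R$ on two hypothetical rotation automata; the helper names are illustrative.

```python
import math

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def kron_vec(u, v):
    """Tensor product of vectors: entry (i, j) -> index n*i + j (zero-based)."""
    return [x * y for x in u for y in v]

def kron_mat(M, N):
    """Tensor product of matrices: entry ((i,k),(j,l)) = M[i][j] * N[k][l]."""
    return [[M[i][j] * N[k][l] for j in range(len(M[0])) for l in range(len(N[0]))]
            for i in range(len(M)) for k in range(len(N))]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def f(s_init, U, P, word):
    """f(w) = |<s_init| U_w P|^2 for a one-letter alphabet with matrix U."""
    v = s_init
    for _ in word:
        v = vec_mat(v, U)
    return sum(abs(x) ** 2 for x in vec_mat(v, P))

# Two hypothetical QFAs over {a}, as in a rotation example.
s, P = [1.0, 0.0], [[0.0, 0.0], [0.0, 1.0]]
U_Q, U_R = rotation(math.pi / 4), rotation(math.pi / 3)

s_prod = kron_vec(s, s)
U_prod, P_prod = kron_mat(U_Q, U_R), kron_mat(P, P)
```

The product automaton is four-dimensional, and `f(s_prod, U_prod, P_prod, w)` agrees with `f(s, U_Q, P, w) * f(s, U_R, P, w)` up to rounding.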

Lemma 3. For any $c \in [0, 1]$, the constant function $f(w) = c$ is a QRL.

Proof. Just choose any $s_{\mathrm{init}}$ and $P_{\mathrm{accept}}$ such that $|s_{\mathrm{init}} P_{\mathrm{accept}}|^2 = c$, and let $U_a = \mathbf{1}$ for all $a$. □

Since we can add and multiply QRLs, we have 

Corollary. Let $f_i$ be QRLs and let $c_j$ be a set of constants such that $\sum_j c_j \le 1$. Then any polynomial
$\sum_j c_j g_j$, where each $g_j$ is a product of a finite number of $f_i$'s, is a QRL.

In a sense, closure under (weighted) addition and multiplication are complex-valued analogs of OR and 
AND. Classical regular languages are closed under both these Boolean operations, as well as complementation: 

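Lemma 4. If $f$ is a QRL, then $\bar{f} = 1 - f$ is a QRL.

Proof. Let $H^\perp_{\mathrm{accept}}$ be the subspace of $H$ perpendicular to $H_{\mathrm{accept}}$ and $P^\perp_{\mathrm{accept}}$ the projection operator onto
it. Since $P_{\mathrm{accept}} + P^\perp_{\mathrm{accept}} = \mathbf{1}$, $P_{\mathrm{accept}} P^\perp_{\mathrm{accept}} = 0$, the $U_a$ are unitary, and $|s_{\mathrm{init}}|^2 = 1$, we have

$$1 = |s_{\mathrm{init}} U_w|^2 = |s_{\mathrm{init}} U_w (P_{\mathrm{accept}} + P^\perp_{\mathrm{accept}})|^2 = |s_{\mathrm{init}} U_w P_{\mathrm{accept}}|^2 + |s_{\mathrm{init}} U_w P^\perp_{\mathrm{accept}}|^2 = f(w) + \bar{f}(w)$$

where $\bar{f}(w) = |s_{\mathrm{init}} U_w P^\perp_{\mathrm{accept}}|^2$. □

Another property of classical regular languages is closure under inverse homomorphism [16]: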

Definition. A homomorphism $h : A^* \to A^*$ is a function that replaces symbols with words. For instance, if
$h(a) = b$ and $h(b) = ab$, then $h(bab) = abbab$. If $f$ is a quantum language, then its inverse image under $h$ is
the language $(f \circ h)(w) = f(h(w))$. (This looks wrong, but it is in fact the proper form for the characteristic
function of the inverse image of a set. Formally, the mapping from sets to characteristic functions acts like
a contravariant functor.)

Lemma 5. If $f$ is a QRL and $h$ is a homomorphism, then the inverse image $f \circ h$ is a QRL.

Proof. Simply replace each $U_a$ with $U_{h(a)}$. Recall that the composition of unitary matrices is unitary. □

2.2 The pumping lemma for QRLs

The following is a well-known classical result [16]: 

Lemma (Pumping Lemma for Regular Languages). If $L$ is a regular language, then any sufficiently
long word $w \in L$ can be written $w = xyz$ such that $x y^k z \in L$ for all $k \ge 0$.

Proof. If an NFA has n states, then any path longer than n transitions contains a loop, which can be repeated 
as many times as desired. □ 

Because of unitarity, we have a slightly stronger result for QRLs in that any subword can be 'pumped'.
However, unlike the classical case, we can't repeat a word arbitrarily many times. Rather, the dynamics is
like an irrational rotation of a circle, so that for any $\epsilon > 0$, there is some $k$ such that $k$ rotations brings us
back to within a distance $\epsilon$ from where we started.

Theorem 6 (Pumping for QRLs). If $f$ is a QRL, then for any word $w$ and any $\epsilon > 0$, there is a $k$ such
that $|f(u w^k v) - f(uv)| < \epsilon$ for any words $u, v$. Moreover, if $f$'s automaton is n-dimensional, there is a
constant $c$ such that $k < (c\epsilon)^{-n}$.

Proof. In its diagonal basis, $U_w$ rotates $n$ complex numbers on the unit circle by $n$ different angles $\omega_i$ for
$1 \le i \le n$. We can think of this as a rotation of an n-dimensional torus. If $V = (c\epsilon)^n$ is the volume of an
n-dimensional ball of radius $\epsilon$, then $U_w^k$ is within a distance $\epsilon$ of the identity matrix for some number of
iterations $k < 1/V$. We illustrate this in figure 1.

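Then we can write $U_w^k = \mathbf{1} + \epsilon J$, where $J$ is a diagonal matrix for which $\sum_{i=1}^{n} |J_{ii}|^2 \le 1$, and

$$f(u w^k v) = |s_{\mathrm{init}} U_u (\mathbf{1} + \epsilon J) U_v P_{\mathrm{accept}}|^2 = f(uv) + \epsilon \, |s_{\mathrm{init}} U_u J U_v P_{\mathrm{accept}}|^2$$

Since

$$|s_{\mathrm{init}} U_u J U_v P_{\mathrm{accept}}|^2 \le \sum_{i=1}^{n} |J_{ii}|^2 \le 1$$

the theorem is proved. □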

If $m$ of the angles are rational fractions $2\pi p/q$, then we return to a $(n - m)$-dimensional torus every
$q$ steps and $k < q (c\epsilon)^{-(n-m)}$.

Fig. 1. Iterating the unitary matrix $U_w$ is equivalent to rotating a torus. If a ball of radius $\epsilon$ has volume $V$, then
after at most $1/V$ iterations the state must return to within a distance $\epsilon$ of its initial position.



In the case where a unitary QFA recognizes a classical language (which we identify with its characteristic 
function), this gives the following: 

Theorem 7. If a regular language $L$ is a QRL, then the transition matrices $M_a$ of the minimal DFA
recognizing $L$ generate a group $\{M_w\}$. Therefore, there are regular languages that are not QRLs.

Proof. Any set of matrices forms a semigroup, so we just have to show that every sequence of transitions
$M_w$ has an inverse.

Define two words as equivalent, $u \sim v$, if they can be followed by the same suffixes: $uw \in L$ if and only if
$vw \in L$. It is well-known [16] that the states of $L$'s minimal DFA are in one-to-one correspondence with $\sim$'s
equivalence classes.

Then if $L$'s characteristic function $\chi_L$ is a QRL, setting $\epsilon < 1$ in theorem 6 shows that for every $w$, there
exists a $k$ such that, for all $u$ and $v$,

$$\chi_L(u w^k v) = \chi_L(uv)$$

which implies $u w^k \sim u$ for all $u$. Then $M_w^k = \mathbf{1}$ in $L$'s minimal DFA since it returns any $u$ to its original
equivalence class, and $M_w$ has an inverse $M_w^{k-1}$. So $\{M_w\}$ is a group.

Most regular languages don't have this property. Consider the language $L$ given in the introduction with
the subword $bb$ forbidden. Inserting $bb$ anywhere in an allowed word makes it disallowed, and this cannot be
undone by following $bb$ with any other subword. Thus $M_{bb}$ has no inverse in $\{M_w\}$, and $L$ is not a QRL. □
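
The group condition is easy to test for a concrete DFA. The sketch below relies on the elementary fact that a finite set of state maps generates a group under composition exactly when every generator is a bijection; the `parity` automaton is a hypothetical contrast case, and applying the test to decide the theorem's condition presupposes that the DFA is minimal.

```python
def is_permutation(mapping):
    """True if a state -> state map is a bijection, i.e. its matrix is invertible."""
    return sorted(mapping.values()) == sorted(mapping.keys())

def generates_group(dfa):
    """The transition monoid is a group iff every generator permutes the states."""
    return all(is_permutation(m) for m in dfa.values())

# The 'no bb' DFA from the introduction, one state map per input symbol.
no_bb = {
    "a": {"A": "A", "B": "A", "R": "R"},
    "b": {"A": "B", "B": "R", "R": "R"},
}

# Hypothetical contrast: a DFA tracking the parity of the number of a's.
parity = {
    "a": {"E": "O", "O": "E"},
    "b": {"E": "E", "O": "O"},
}
```

Here `generates_group(no_bb)` is false, in line with the proof above, whereas `generates_group(parity)` is true, so the parity language is not excluded by this criterion.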

In contrast, in the generalized case where the $U_a$ don't have to be unitary, we have

Lemma 8. Any regular language is a generalized QRL.

Proof. Let the $U_a$ be the Boolean transition matrices of $L$'s DFA. Then there is exactly one allowed path
for each allowed word, so $f(w) = \chi_L(w)$. □

Combining this with theorem 7 gives the following:

Corollary. The QRLs are a proper subclass of the generalized QRLs. 



2.3 QRLs are rational

In classical language theory, we are often interested in the generating function of a language,
$g_L(z) = \sum_{w \in L} z^{|w|}$ or equivalently $\sum_n N_n z^n$, where $N_n$ is the number of words of length $n$ in $L$. More generally, if
we think of the symbols $a \in A$ as noncommuting variables, we can write a formal power series $G_L = \sum_{w \in L} w$,
whereupon setting $a = z$ for all $a \in A$ gives $G_L = g_L(z)$.

A beautiful theory of such series is given in [18]. In particular, the generating function of a regular
language is always rational, i.e. the quotient of two polynomials. To see this, sum equation (1) over all
lengths, labelling transitions with their respective symbols. Using a DFA with one computation path per
word, if we define $M = \sum_{a \in A} a M_a$ and rewrite the sum over all words as a sum over all lengths, we have

$$G_L = \sum_w \left( s_{\mathrm{init}}^T \cdot M_w \cdot P_{\mathrm{accept}} \right) w = s_{\mathrm{init}}^T \cdot \left( \sum_{n=0}^{\infty} M^n \right) \cdot P_{\mathrm{accept}} = s_{\mathrm{init}}^T \cdot (\mathbf{1} - M)^{-1} \cdot P_{\mathrm{accept}}$$

which is rational in each symbol $a$ since each component of $(\mathbf{1} - M)^{-1}$ is. Then restricting to $a = z$ for all $a$
gives a rational $g_L(z)$ as well.

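For instance, for the regular language given above with $bb$ forbidden, $M = \begin{pmatrix} a & b \\ a & 0 \end{pmatrix}$, $s_{\mathrm{init}} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and
$P_{\mathrm{accept}} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Here $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ represents the reject state. Then the reader can check that

$$(\mathbf{1} - M)^{-1} = \frac{1}{1 - a - ab} \begin{pmatrix} 1 & b \\ a & 1 - a \end{pmatrix}$$

and

$$G_L = \frac{1 + b}{1 - a - ab} = 1 + a + b + aa + ab + ba + \cdots$$

where the empty word is now denoted by 1. Setting $a = b = z$ gives

$$g_L(z) = \frac{1 + z}{1 - z - z^2} = 1 + 2z + 3z^2 + 5z^3 + \cdots$$

recovering the well-known fact that the number of words of length $n$ is the n'th Fibonacci number.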
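
The Fibonacci counts can be confirmed by brute force. The sketch below counts words of each length that avoid `bb` and compares them with the series coefficients of $(1 + z)/(1 - z - z^2)$, which satisfy $c_0 = 1$, $c_1 = 2$, and $c_n = c_{n-1} + c_{n-2}$; the function names are illustrative.

```python
from itertools import product

def count_no_bb(n):
    """Brute-force N_n: words of length n over {a, b} that avoid 'bb'."""
    return sum("bb" not in "".join(w) for w in product("ab", repeat=n))

def series_coefficients(n_terms):
    """Coefficients of (1 + z) / (1 - z - z^2) via c_n = c_{n-1} + c_{n-2}."""
    coeffs = [1, 2]
    while len(coeffs) < n_terms:
        coeffs.append(coeffs[-1] + coeffs[-2])
    return coeffs[:n_terms]
```

Both computations give 1, 2, 3, 5, 8, 13, ... for lengths 0, 1, 2, and so on.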

The obvious generalization of this is 

Definition. If $f$ is a quantum language, then its generating function $G_f$ is the formal sum $\sum_{w \in A^*} f(w) \, w$.

Theorem 9. If $f$ is a generalized QRL, then $G_f$ is rational.

Proof. We first consider generating functions $g$ based on complex amplitudes rather than total probabilities.
The accepting subspace $H_{\mathrm{accept}}$ is spanned by a finite number of perpendicular unit vectors $h_i$. Then if we
define $g_i = \sum_w \langle s_{\mathrm{init}}|U_w|h_i\rangle \, w$ and $U = \sum_{a \in A} a U_a$, we have

$$g_i = \langle s_{\mathrm{init}} | (\mathbf{1} - U)^{-1} | h_i \rangle$$

and the $g_i$ are rational.

The Hadamard product of two series $C = \sum_w c_w w$ and $D = \sum_w d_w w$ is the series formed by multiplying
their coefficients term-by-term, $C \odot D = \sum_w c_w d_w w$. Since $|v P_{\mathrm{accept}}|^2 = \sum_i |\langle v|h_i\rangle|^2$ for any vector $v$, i.e.
the probability of being in $H_{\mathrm{accept}}$ is the (noninterfering) sum of the squares of the amplitudes along each of
the $h_i$, we have

$$G_f = \sum_i g_i^* \odot g_i$$

The class of rational series is closed under both addition and Hadamard product [18], so $G_f$ is rational.
(These closure properties are generalizations of the closure of the class of regular languages under union and
intersection.) □

The theory of rational generating functions has also been used in the recognition of languages by neural 
networks [32]. 



2.4 Real representation and stochastic automata 

We should investigate the relationship between quantum and real-valued stochastic automata, since the 
latter have been extensively studied. We alluded to the following in the introduction [23, 35] : 

Definition. A generalized stochastic function is a function from words over an alphabet $A$ to real numbers,
$f : A^* \to \mathbb{R}$, for which there are real-valued vectors $\pi$ and $\eta$ and real-valued matrices $M_a$ for each $a \in A$
such that $f$ is a bilinear form,

$$f(w) = \pi^T \cdot M_w \cdot \eta$$

where $M_w = M_{w_1} M_{w_2} \cdots M_{w_{|w|}}$ as before. We will call such a function n-dimensional if $\pi$, $\eta$ and the $M_a$ are
n-dimensional.

If the components of $\eta$ are 0 and 1, denoting nonaccepting and accepting states, and if $\pi$ and the rows
of the $M_a$ have non-negative entries that sum to 1 so that probability is preserved, then $f$ is a stochastic
function. If we allow negative entries but still require that $\pi$ and the rows of the $M_a$ sum to 1, then $f$ is
pseudo-stochastic.

It is well known that complex numbers $c = a + bi$ can be represented by $2 \times 2$ real matrices
$\bar{c} = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}$. The reader can check that multiplication is faithfully reproduced and that
$\bar{c}^T \bar{c} = |c|^2 \mathbf{1}$. In the same way, an $n \times n$ complex matrix can be simulated by a $2n \times 2n$ real-valued matrix.
Moreover, this matrix is unitary if the original matrix is.
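
The real-matrix representation of complex numbers can be checked in a few lines. The sketch below assumes the sign convention in which $a + bi$ is sent to the matrix with rows $(a, b)$ and $(-b, a)$ (the transposed convention works equally well); `as_real` and `mat_mul` are illustrative names.

```python
def as_real(c):
    """Represent c = a + bi as the real matrix [[a, b], [-b, a]]."""
    return [[c.real, c.imag], [-c.imag, c.real]]

def mat_mul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

c, d = 1 + 2j, 3 - 1j
product_matrix = mat_mul(as_real(c), as_real(d))    # should represent c * d
norm_matrix = mat_mul(transpose(as_real(c)), as_real(c))   # should be |c|^2 * identity
```

With these values `product_matrix` equals `as_real(c * d)`, and `norm_matrix` is 5 times the identity since $|1 + 2i|^2 = 5$.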

Using this representation, we can show the following: 

Theorem 10. Any generalized QRL recognized by an n-dimensional generalized QFA is a $2n^2$-dimensional
generalized stochastic function.

Proof. First we transform our automaton so that the output $f(w)$ is a bilinear, rather than quadratic,
function of the machine's state. As before, let $h_i$ be a set of perpendicular unit vectors spanning $H_{\mathrm{accept}}$.
Then

$$f(w) = \sum_i |\langle s_{\mathrm{init}}|U_w|h_i\rangle|^2
      = \sum_i \langle s^*_{\mathrm{init}} \otimes s_{\mathrm{init}} | \, U^*_w \otimes U_w \, | h^*_i \otimes h_i \rangle
      = \langle s^*_{\mathrm{init}} \otimes s_{\mathrm{init}} | \, U^*_w \otimes U_w \, | \sum_i h^*_i \otimes h_i \rangle$$

This has the form $\pi^T \cdot M_w \cdot \eta$ with $\pi = s^*_{\mathrm{init}} \otimes s_{\mathrm{init}}$, $M_a = U^*_a \otimes U_a$ for all $a \in A$, and $\eta = \sum_i h^*_i \otimes h_i$. Since
these are the tensor products of n-dimensional objects, they have $n^2$ dimensions. However, their entries are
still complex-valued.

Using the representation above, we transform $\pi^T$, $M_a$, and $\eta$ into $2 \times 2n^2$, $2n^2 \times 2n^2$, and $2n^2 \times 2$
real-valued matrices $\bar{\pi}^T$, $\bar{M}_a$, and $\bar{\eta}$, respectively, and $\bar{\pi}^T \cdot \bar{M}_w \cdot \bar{\eta} = f(w) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Letting $\pi$ and $\eta$ be the top row
of $\bar{\pi}$ and the left column of $\bar{\eta}$, respectively, gives the desired real-valued, bilinear form. □
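
The first step of the proof, turning a squared amplitude into a bilinear form on a tensor-product space, can be checked numerically. The sketch below does this for a hypothetical two-dimensional example with a single accepting vector; the particular `s`, `U`, and `h` and all helper names are illustrative assumptions.

```python
import math

def conj_vec(v):
    return [complex(x).conjugate() for x in v]

def conj_mat(M):
    return [[complex(x).conjugate() for x in row] for row in M]

def kron_vec(u, v):
    """Tensor product of vectors."""
    return [x * y for x in u for y in v]

def kron_mat(M, N):
    """Tensor product of matrices, indexed consistently with kron_vec."""
    return [[M[i][j] * N[k][l] for j in range(len(M[0])) for l in range(len(N[0]))]
            for i in range(len(M)) for k in range(len(N))]

def bilinear(pi, M, eta):
    """pi^T . M . eta, with no complex conjugation anywhere."""
    return sum(pi[i] * M[i][j] * eta[j]
               for i in range(len(pi)) for j in range(len(eta)))

r = 1 / math.sqrt(2)
s = [0.6, 0.8j]                       # unit-norm initial state
U = [[r, r * 1j], [r * 1j, r]]        # a 2x2 unitary matrix
h = [0.0, 1.0]                        # single accepting basis vector

quadratic = abs(bilinear(s, U, h)) ** 2          # |<s|U|h>|^2
pi = kron_vec(conj_vec(s), s)
M = kron_mat(conj_mat(U), U)
eta = kron_vec(conj_vec(h), h)
lifted = bilinear(pi, M, eta)                    # <s* (x) s| U* (x) U |h* (x) h>
```

Both `quadratic` and `lifted` evaluate to 0.98 up to rounding; the lifted objects have $n^2 = 4$ components but are still complex-valued, which is why the proof then passes to the real representation.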

This expression of a QRL as a generalized stochastic function gives us transition matrices that are 
unitary but neither stochastic nor pseudo-stochastic. A logical question, then, is whether the class of QRLs 
is contained in the class of stochastic functions, or vice versa, and similarly for the pseudo-stochastic functions. 
Since the only matrices that are both pseudo-stochastic and unitary are permutation matrices, it seems more 
likely that the QRLs are incomparable with both these classes. In that case, their intersection would be the
stochastic quantum regular languages (SQuRLs) [25]. 

If a generalized stochastic function $f$ is the characteristic function of some language $L$, then $L$ can be
defined as $L = \{w \mid f(w) > 0\}$. Turakainen [35] showed that $f$ can be replaced with a stochastic function, in
which case $L$ is a 0-stochastic language. Bukharaev [5] has shown that any such language is regular, so we
have a converse to lemma 8:

Corollary. If the characteristic function of a language L is a generalized QRL, then L is regular. 



3 Quantum context-free languages 
3.1 Quantum push-down automata (QPDAs) 

Next, we define quantum push-down automata and show that several modifications to the definition result 
in equivalent machines. 

Definition. A quantum push-down automaton (QPDA) is a real-time quantum automaton where $H$ is the
tensor product of a finite-dimensional space $Q$, which we will call the control state, and an infinite-dimensional
stack space $S$, each basis vector of which corresponds to a finite word over a stack alphabet $T$. We also require
that $s_{\text{init}}$, which is now infinite-dimensional, be a superposition of a finite number of different initial control
and stack states.

Because of the last-in, first-out structure of a stack, only certain transitions can occur. If $q_1, q_2 \in Q$ are
control states and $\sigma_1, \sigma_2 \in T^*$ are stack states, then the transition amplitude $\langle (q_1, \sigma_1) | U_a | (q_2, \sigma_2) \rangle$ can be
nonzero only if $t\sigma_1 = \sigma_2$, $\sigma_1 = t\sigma_2$, or $\sigma_1 = \sigma_2$ for some $t \in T$. In other words, transitions can only push
or pop single symbols on or off the stack or leave the stack unchanged. Furthermore, transition amplitudes
can depend on the control state and the stack, but only on the top (leftmost) symbol of $\sigma_1$ and $\sigma_2$, or on
whether or not the stack is empty.

Finally, for acceptance we demand that the QPDA end in both an accepting control state and with an
empty stack. That is, $H_{\text{accept}} = Q_{\text{accept}} \otimes \{\epsilon\}$ for some subspace $Q_{\text{accept}} \subset Q$.






This definition differs in several ways from that of classical PDAs [16]. First of all, the amplitude of a 
popping transition can depend both on the top stack symbol and the one below it, since the one below 
it is the top symbol of the stack we're making a transition to. We do this for the sake of unitarity and 
time-symmetry, since the amplitude of a pushing transition depends on both the top symbol and the symbol 
pushed. Similarly, popping transition amplitudes can depend on whether the stack will be empty afterwards. 

In the generalized case where the transition matrices are not constrained to be unitary, we can easily get 
rid of this dependence: 

Lemma 11. A generalized QPDA can be simulated by a generalized QPDA whose transition amplitudes do 
not depend on the second-topmost stack symbol. 

Proof. Simply expand the stack alphabet to $T' = T \cup T^2$. Let each stack symbol also inform the QPDA of
the symbol below it or that it is the bottom symbol. For instance, the stack $stu$ becomes $(s,t)(t,u)u$. □
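The re-encoding used in this proof can be sketched as follows. This is only an illustration of the stack bookkeeping (amplitudes are not modeled), and the function names are ours: each symbol is paired with the symbol below it, the bottom symbol is left as it is, and pushes and pops keep the encoding consistent.

```python
def annotate(stack):
    """Encode a stack (top symbol first) over T as a stack over T' = T u T^2."""
    return [(stack[i], stack[i + 1]) if i + 1 < len(stack) else stack[i]
            for i in range(len(stack))]

def top_symbol(symbol):
    """Recover the original symbol from an annotated one."""
    return symbol[0] if isinstance(symbol, tuple) else symbol

def push(annotated, t):
    """Push t, recording the symbol it now sits on (if any)."""
    if annotated:
        return [(t, top_symbol(annotated[0]))] + annotated
    return [t]

def pop(annotated):
    """Pop the top symbol; the new top already knows what lies below it."""
    return annotated[1:]

example = annotate("stu")   # the example from the proof
```

Here `example` is `[('s', 't'), ('t', 'u'), 'u']`, so the top annotated symbol alone reveals both the top and the second-topmost symbol of the original stack.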

However, we believe lemma 11 holds only in the generalized case. While the machine's dynamics are still
unitary on the subset of the stack space that we will actually visit, we see no way to extend them to the entire
stack space, including nonsense stacks like $(s,t)(u,w)$, in a unitary, time-symmetric way.

Again, for time-symmetry's sake, since we can only pop one symbol at a time, we only allow ourselves 
to push one symbol at a time. We next show that allowing us to push words of arbitrary length adds no 
additional power, just as for classical PDAs, at least in the generalized case: 

Lemma 12. A generalized QPDA that is allowed to push words of arbitrary length on the stack can be 
simulated by a generalized QPDA as defined above, for which every move pushes or pops one symbol or 
leaves the stack unchanged. 

Proof. In the classical case, we can do this simply by adding extra control states that push the word on one 
symbol at a time (lemma 10.1 of [16]). However, this allows several steps per input symbol and thus violates 
our real-time restriction, so we need a slightly more subtle construction. 

Suppose the old QPDA pushes words $\gamma$ of length at most $k$. Then we expand the stack alphabet to
composite symbols $T' = T^k \times \{1, \ldots, k\}$, which we will denote $(\beta, m)$, and expand the set of control states
to $Q' = Q \times \{1, \ldots, k\}$, which we will denote $(q, m_0)$.

We represent the old QPDA's stack as shown in figure 2. If the stack of the new QPDA is $(\beta_1, m_1)(\beta_2, m_2) \cdots (\beta_s, m_s)$,
then each $\beta_i$ represents a chunk of the old QPDA's stack, starting with $\beta_i$'s $m_{i-1}$'th symbol. Alternately,
each $m_i$ is a pointer telling us to skip to the $m_i$'th symbol of $\beta_{i+1}$. The pointer $m_0$ to $\beta_1$ is stored in the
control state.



[Figure 2 panels: new state & stack; old state & stack.]

Fig. 2. Simulating a QPDA that can push words of length $\le 4$ on the stack with one that only pushes or pops single
symbols. The counter $m_i$ in each stack symbol $(\beta_i, m_i)$ acts as a pointer to the first relevant symbol in $\beta_{i+1}$. The
pointer for $\beta_1$ is stored in the control state. The symbols to the left of each pointer are either dummies or symbols
that have been popped off the original QPDA's stack.



Using lemma 11, we assume that the old QPDA's transition amplitudes depend only on its top stack 
symbol. We operate the new QPDA as follows, replacing the transitions of the old QPDA with new ones of 
the same amplitude: 

- To pop the top symbol, i.e. the $m_0$'th symbol of $\beta_1$, change the control state by incrementing $m_0$. If
$m_0 = k$, pop $(\beta_1, m_1)$ off the stack and set $m_0 = m_1$ in the control state.

- To push a nonempty word $\gamma$ of length $n \le k$, choose a dummy symbol $a$ and push $(a^{k-n}\gamma, m_0)$ on the
stack, padding $\gamma$ out to length $k$. Then set $m_0 = k - n + 1$ in the control state.

This converts a QPDA into one where each transition pushes or pops one symbol, or changes the topmost
symbol of the stack by popping when $m_0 = k$ and then pushing a nonempty $\gamma$.

This simulation preserves our real-time restriction, and creates a QPDA which pushes or pops one symbol, 
or changes the top symbol, at each step. To complete the proof, we need to convert this QPDA into one that 
pushes, pops, or leaves the stack unchanged. This can be done by making the top symbol part of the control 
state, $Q'' = Q' \times T'$, so that we can change the top symbol by changing the state instead (as in lemma 10.2
of [16]). □ 
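The pointer bookkeeping in this construction can be sketched in a few lines. The class below is a hypothetical illustration covering only the stack contents, not the amplitudes: it stores chunks of length $k$ together with pointers, keeps $m_0$ as a separate field standing in for the control state, and moves at most one composite symbol per operation.

```python
class ChunkedStack:
    """Stack of composite symbols (beta, m); m0 plays the role of the control-state pointer."""

    def __init__(self, k, dummy="#"):
        self.k = k
        self.dummy = dummy   # padding symbol, assumed not to occur in pushed words
        self.m0 = 1          # pointer into the top chunk (1-based)
        self.chunks = []     # top chunk first

    def push(self, word):
        """Push a nonempty word of length n <= k as one padded chunk."""
        n = len(word)
        assert 1 <= n <= self.k
        self.chunks.insert(0, (self.dummy * (self.k - n) + word, self.m0))
        self.m0 = self.k - n + 1

    def pop(self):
        """Pop one original symbol: advance m0, or drop the exhausted top chunk."""
        beta, m1 = self.chunks[0]
        top = beta[self.m0 - 1]
        if self.m0 == self.k:
            self.chunks.pop(0)
            self.m0 = m1
        else:
            self.m0 += 1
        return top

    def contents(self):
        """Reconstruct the old QPDA's stack (top symbol first)."""
        out, pointer = [], self.m0
        for beta, m in self.chunks:
            out.append(beta[pointer - 1:])
            pointer = m
        return "".join(out)
```

For example, with $k = 4$, pushing `stu`, popping once, and pushing `ab` leaves the chunks `("##ab", 3)` and `("#stu", 1)` with $m_0 = 3$, which decodes to the old stack `abtu`.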

Like lemma 11, we believe lemma 12 holds only in the generalized case. Unitarity appears to be lost even 
on the set of stacks actually visited. The stack state of the old QPDA is represented by many stack states of 
the new QPDA, depending on the intervening computation, and some of these receive less probability than 
others. 

In the classical case, acceptance by control state and by empty stack are equivalent. We can prove this 
in one direction, in both the unitary and generalized case: 

Lemma 13. If a quantum language is accepted by a (generalized) QPDA by empty stack, then it is accepted 
by a (generalized) QPDA by control state. 

Proof. The standard construction (theorem 5.1 of [16]) simply allows the PDA to empty its stack at the end 
of its computation, without reading any additional input. Since this violates our real-time restriction of one 
step per input symbol, we use a slightly different construction that also preserves unitarity. 

First, double the number of control states to $Q' = Q \oplus \bar{Q}$, with a marked control state $\bar{q} \in \bar{Q}$ for each
state $q \in Q$. Marked control states will denote an empty stack. Then replace transitions of the old QPDA
that pop to or push on an empty stack with new transitions, with the same amplitudes, as follows:

- Replace pops of the form $(q_1, t) \to (q_2, \epsilon)$ with $(q_1, t) \to (\bar{q}_2, \epsilon)$

- Replace pushes of the form $(q_1, \epsilon) \to (q_2, t)$ with $(\bar{q}_1, \epsilon) \to (q_2, t)$

Require all states $(q, \epsilon)$ (an unmarked control state and an empty stack) and $(\bar{q}, \sigma)$ (a marked control state
and a nonempty stack) to make transitions only to themselves with amplitude 1. Finally, let $s_{\text{init}}$ have
nonzero components only along states $(\bar{q}, \epsilon)$ that are marked and empty and $(q, \sigma)$ that are unmarked and
nonempty.

Then the new QPDA will be in a marked control state if and only if the stack is empty, so we accept with
$H_{\text{accept}} = \bar{Q}_{\text{accept}} \otimes S$. The new transition matrices are direct sums of the old ones (with the basis vectors
$(q, \epsilon)$ replaced by $(\bar{q}, \epsilon)$) with an identity matrix (on the space generated by the $(q, \epsilon)$ and $(\bar{q}, \sigma)$). Thus if the
old QPDA is unitary, the new one is too. □

Unfortunately, we believe that a QPDA accepting by control state without regard to the stack cannot
in general be simulated by one accepting by empty stack. The accepting subspace $H_{\text{accept}} = Q_{\text{accept}} \otimes S$ is
infinite-dimensional, allowing for an infinite number of different paths that add in a noninterfering way. We
see no way to map this into a finite-dimensional subspace of the form $Q_{\text{accept}} \otimes \{\epsilon\}$. Perhaps the reader can
find a proof of this.

The last difference between QPDAs and classical PDAs is that, depending on its precise definition, a
classical PDA either halts and accepts as soon as its stack becomes empty or rejects if it is asked to pop
off an empty stack. In our case, we allow a QPDA to sense whether the stack is empty and act accordingly. 
We do this because of our strict real-time constraint, in which the only time the QPDA is allowed to talk 
back to us is when we perform a measurement at the end of the input process. Therefore, we have to tell the 
machine what to do if its stack is already empty and it receives more input. 

3.2 Quantum context-free grammars 

We now propose a definition of quantum grammars, in which each production has a set of complex amplitudes 
and multiple derivations of a word can interfere with each other constructively or destructively. We show 
that in the context-free case, these grammars generate exactly the languages recognized by quantum PDAs. 
Definition. A quantum grammar $G$ consists of two alphabets $V$ and $T$, the variables and terminals, an
initial variable $I \in V$, and a finite set $P$ of productions $\alpha \to \beta$, where $\alpha \in V^*$ and $\beta \in (V \cup T)^*$. Each
production in $P$ has a set of complex amplitudes $c_k(\alpha \to \beta)$ for $1 \le k \le n$, where $n$ is the dimensionality of
the grammar.

We define the $k$'th amplitude $c_k$ of a derivation $\alpha \Rightarrow \beta$ as the product of the $c_k$'s for each production
in the chain and $c_k(\alpha \Rightarrow \beta)$ as the sum of the $c_k$'s of all derivations of $\beta$ from $\alpha$. Then the amplitudes of
a word $w \in T^*$ are $c_k(w) = c_k(I \Rightarrow w)$ and the probability associated with $w$ is the norm of its vector of
amplitudes, summed over each dimension of the grammar, $f(w) = \sum_{k=1}^{n} |c_k(w)|^2$. We say $G$ generates the
quantum language $f$.

Finally, a quantum grammar is context-free if only productions where a is a single variable v have 
nonzero amplitudes. A quantum context-free language (QCFL) is one generated by some quantum context- 
free grammar. 

The main result of this section is that a quantum language is context-free if and only if it is recognized 
by a generalized QPDA. We prove this with a series of lemmas that track the standard proof almost exactly. 
Our only innovation is attaching complex amplitudes to the productions and transitions, and showing that 
they match. A similar proof in the real-valued case is given for probabilistic tree automata in [12].

The multiple amplitudes Ck attached to each production seem rather awkward. As we will see below, they 
are needed so that paths ending in perpendicular states in Qaccept can add in a noninterfering way. If we had 
only one amplitude, then all paths would interfere with each other. In the grammars we actually construct,
the $c_k$'s will be equal for all but a few productions.

Definition. Two quantum grammars $G_1$ and $G_2$ are equivalent if they generate the same quantum language,
$f_1(w) = f_2(w)$ for all $w$.

Definition. A quantum context-free grammar is in Greibach normal form if only productions of the form
$v \to a\gamma$ where $a \in T$ and $\gamma \in V^*$ can have nonzero amplitudes, i.e. every product $\beta$ consists of a terminal
followed by a (possibly empty) string of variables.

Lemma 14. Any quantum context-free grammar is equivalent to one in Greibach normal form. 

Proof. This is essentially the same proof as in [12] for the real-valued case. 

Clearly G' is equivalent to G if for each derivation in G of a terminal word, there is exactly one derivation 
in G' with the same set of amplitudes. Then summing the amplitudes over all derivations will give the same 
answer for both grammars. All we need to do, then, is to attach amplitudes to the standard proof for classical 
grammars (lemmas 4.1-4.4 and theorems 4.1-4.6 of [16]) and show that they are carried through correctly.
As shorthand, we will refer to $c_k$ and $c'_k$ for all $k$ as simply $c$ and $c'$, respectively.

First, theorem 4.4 of [16] shows how to eliminate unit productions of one variable by another, $v_1 \to v_2$. If
$G$ has such productions, then for every production $v_j \to \beta$ in $G$ where $\beta$ is not a single variable, give $G'$ the
productions

$$c'(v_i \to \beta) = c(v_i \Rightarrow \beta) = \sum_j c(v_i \Rightarrow v_j)\, c(v_j \to \beta)$$

for all $i$, where

$$c(v_i \Rightarrow v_j) = \sum_{n=0}^{\infty} (M^n)_{ij} = (1 - M)^{-1}_{ij}$$

sums over all paths from $v_i$ to $v_j$ with $n$ unit productions, and $M_{ij} = c(v_i \to v_j)$. Then setting $c'(v_i \to v_j) = 0$
leaves $G'$ with no unit productions.
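The geometric series over chains of unit productions can be illustrated numerically. The sketch below uses a hypothetical two-variable grammar with unit productions $v_1 \to v_2$ and $v_2 \to v_1$ (amplitudes $0.5i$ and $0.5$) and non-unit productions $v_1 \to a$ and $v_2 \to b$ (amplitude 1); it approximates $(1 - M)^{-1}$ by a truncated sum, which is adequate here only because the powers of this particular $M$ decay quickly.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def unit_closure(M, terms=60):
    """Approximate sum_{n >= 0} M^n, i.e. (1 - M)^{-1}, by a truncated series."""
    n = len(M)
    total, power = identity(n), identity(n)
    for _ in range(terms):
        power = matmul(power, M)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# M[i][j] = c(v_i -> v_j): hypothetical unit-production amplitudes.
M = [[0, 0.5j], [0.5, 0]]
# c(v_j -> beta) for the non-unit productions, indexed by j.
c_nonunit = {"a": [1, 0], "b": [0, 1]}

S = unit_closure(M)   # S[i][j] = c(v_i => v_j)

# New amplitudes c'(v_i -> beta) = sum_j c(v_i => v_j) c(v_j -> beta).
c_prime = {beta: [sum(S[i][j] * col[j] for j in range(2)) for i in range(2)]
           for beta, col in c_nonunit.items()}
```

In this example $c'(v_1 \to a) = 1/(1 - 0.25i)$, the sum over all even-length cycles through $v_2$ and back before $v_1$ finally produces $a$.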

Second, theorem 4.5 of [16] converts a grammar to Chomsky normal form, in which $\beta$ consists of either a
single terminal or two variables. For any production $v \to \beta$ in $G$ where $\beta$ consists of $m$ variables $b_1 b_2 \cdots b_m$,
introduce additional variables $d_1, d_2, \ldots, d_{m-2}$ and allow the productions $v \to b_1 d_1$, $d_1 \to b_2 d_2$, $\ldots$, $d_{m-2} \to b_{m-1} b_m$
in $G'$. Then $G'$ derives $\beta$ from $v$ with amplitude

$$c'(v \Rightarrow \beta) = c'(v \to b_1 d_1) \cdot \prod_{i=1}^{m-3} c'(d_i \to b_{i+1} d_{i+1}) \cdot c'(d_{m-2} \to b_{m-1} b_m)$$

which we can make equal to $c(v \to \beta)$ by choosing the $c'$ on the right-hand side appropriately, e.g. with
$c'(v \to b_1 d_1) = c(v \to \beta)$ and the others set to 1.

Finally, lemma 4.4 of [16] eliminates productions of the form $v \to v\alpha$. If $G$ has such productions and $v$'s
other productions in $G$ are $v \to \beta$, add a variable $b$ and give $G'$ the productions

$$c'(b \to \alpha) = c'(b \to \alpha b) = c(v \to v\alpha)$$
$$c'(v \to \beta) = c'(v \to \beta b) = c(v \to \beta)$$

for all $\alpha$ and $\beta$. Then

$$c'(v \Rightarrow \beta\alpha_1\alpha_2 \cdots \alpha_m) = c'(v \to \beta b) \cdot \prod_{i=1}^{m-1} c'(b \to \alpha_i b) \cdot c'(b \to \alpha_m)$$
$$= c(v \to \beta) \cdot \prod_{i=1}^{m} c(v \to v\alpha_i)$$
$$= c(v \Rightarrow \beta\alpha_1\alpha_2 \cdots \alpha_m)$$

where the derivation tree for $G'$ now produces the $\alpha_i$ from left to right rather than from right to left.

The reader can easily check that the rest of the proof of theorem 4.6 of [16] can be rewritten this way, so 
that G and G' have derivations with all the same complex amplitudes. □ 

Greibach normal form is useful because the derivation trees it generates create a terminal symbol on the 
left with every production. Each such tree corresponds to a computation of a real-time PDA that accepts 
with an empty stack. Adding complex amplitudes gives us the quantum version of theorem 5.3 of [16]: 

Theorem 15. Any QCFL is recognized by a generalized QPDA. 

Proof. Convert the QCFL's grammar into Greibach normal form. Then construct a QPDA with the terminals
$T$ as its input symbols, with the variables $V$ as its stack alphabet, and with one control state $q_k$ for each
dimension of the grammar, $1 \le k \le n$.

Let the QPDA's transitions be as follows. For each production $v \to a\gamma$ where $a \in T$ and $\gamma \in V^*$, if the
control state is $q_k$ and the top stack symbol is $v$, let $U_a$ pop $v$ and push $\gamma$ on the stack with amplitude
$c_k(v \to a\gamma)$. Always leave the control state unchanged.

Then as we read the input symbols $a$, the QPDA guesses a derivation tree and ends with an empty
stack. The amplitude of a computation path with control state $q_k$ is equal to the $k$'th amplitude of the
corresponding derivation. Summing over all paths is equivalent to summing over all derivations. If the
QPDA's initial control state vector is $q_{\text{init}} = (1, 1, \ldots, 1)$, the initial stack is $I$, and $Q_{\text{accept}} = Q$, then
projecting onto $H_{\text{accept}} = Q \otimes \{\epsilon\}$ sums over all $k$ and gives the norm $f(w) = \sum_k |c_k(w)|^2$.

This gives us a QPDA that pushes whole words on the stack. Using lemma 12, we can convert it into one 
that pushes or pops one symbol or leaves the stack unchanged, and we're done. □ 
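The correspondence between leftmost derivations in Greibach normal form and real-time stack computations can be sketched directly. The example below is a hypothetical one-dimensional grammar for $\{a^n b^n : n \ge 1\}$ with productions $I \to aIB$, $I \to aB$, and $B \to b$, all of amplitude 1; the function sums amplitudes over every computation path that consumes one input symbol per step and ends with an empty stack. It pushes whole words at once, as in the first part of the proof.

```python
from functools import lru_cache

# (top stack variable, input symbol) -> list of (pushed variables, amplitude).
# Hypothetical grammar in Greibach normal form: I -> a I B | a B,  B -> b.
PRODUCTIONS = {
    ("I", "a"): [(("I", "B"), 1 + 0j), (("B",), 1 + 0j)],
    ("B", "b"): [((), 1 + 0j)],
}

def amplitude(word, start="I", productions=PRODUCTIONS):
    """Sum of path amplitudes over all real-time computations ending with an empty stack."""

    @lru_cache(maxsize=None)
    def run(pos, stack):
        if pos == len(word):
            return 1 + 0j if not stack else 0j
        if not stack:
            return 0j   # no production applies once the stack is empty
        total = 0j
        for pushed, c in productions.get((stack[0], word[pos]), []):
            # Pop the top variable, push gamma, multiply in the amplitude.
            total += c * run(pos + 1, pushed + stack[1:])
        return total

    return run(0, (start,))

def probability(word):
    """f(w) = |c(w)|^2 for a grammar of dimensionality 1."""
    return abs(amplitude(word)) ** 2
```

For a grammar of higher dimensionality one would run the same sum once per $k$ with the amplitudes $c_k$ and add the squared magnitudes.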

Conversely, by assigning the correct amplitudes to the productions in theorem 5.4 of [16], we can make 
each derivation match a computation path of a QPDA: 

Theorem 16. Any quantum language recognized by a generalized QPDA is a QCFL. 

Proof. By lemma 11, we will assume that the QPDA's transition amplitudes do not depend on the second- 
topmost stack symbol. 

Our variables will be of the form $[q_1, t, q_2]$, where $q_1, q_2 \in Q$ and $t \in T \cup \{\epsilon\}$. The leftmost variable will
tell us that the QPDA is in control state $q_1$ with top symbol $t$ (or an empty stack if $t = \epsilon$) and will be in
state $q_2$ by the time $t$ is popped. As in the previous theorem, the terminals will be the input symbols of the
QPDA, and the $k$'th amplitude $c_k$ of the derivation will be the amplitude of all paths that end with a final
state $q_k$. Thus the dimensionality of the grammar is equal to that of $Q_{\text{accept}}$.

To start us off, we guess the QPDA's final state $q_k$, initial state $q_1$, and initial stack $\beta$, and what states
$q_2, \ldots, q_{|\beta|}$ we will go through as we pop the symbols of $\beta$. For each allowed control state $q_k \in Q_{\text{accept}}$,
for each state-stack pair $(q_1, \beta)$ with nonzero amplitude in $s_{\text{init}}$, and for all possible chains of control states
$q_2, \ldots, q_{|\beta|} \in Q$, allow the production

$$I \to [q_1, \beta_1, q_2]\,[q_2, \beta_2, q_3] \cdots [q_{|\beta|}, \beta_{|\beta|}, q_k]$$

with amplitudes $c_k = \langle s_{\text{init}} | (q_1, \beta) \rangle$ and $c_j = 0$ for all $j \ne k$. (These will be our only productions for which
$c_k$ depends on $k$.)

Then reading an input symbol $a \in A$, pushing a symbol $s$ on the stack, and entering state $q_3$ is represented
by a production of the form

$$[q_1, t, q_2] \to a\,[q_3, s, q_4]\,[q_4, t, q_2] \qquad (2)$$

whose amplitudes are all equal to the amplitude $\langle (q_1, \sigma) | U_a | (q_3, s\sigma) \rangle$ of this QPDA transition. This
production is allowed for any $q_4$, which is the state we guess that we will pass through after popping $s$ at some
later time.

Similarly, reading an input symbol $a$, popping $t$ off the stack, and entering state $q_2$ is represented by

$$[q_1, t, q_2] \to a \qquad (3)$$

whose amplitudes $c_k$ are all equal to the amplitude $\langle (q_1, t\sigma) | U_a | (q_2, \sigma) \rangle$ of this transition. Changing the state
to $q_3$ while leaving the stack unchanged is represented by

$$[q_1, t, q_2] \to a\,[q_3, t, q_2] \qquad (4)$$

with amplitudes $c_k = \langle (q_1, \sigma) | U_a | (q_3, \sigma) \rangle$.

Then, if we apply our productions always to the leftmost variable, we see that each derivation tree
corresponds to a computation path of the QPDA with the same amplitude as the derivation. Summing
over derivations sums over computation paths. $c_k(w) = \langle s_{\text{init}} | U_w | (q_k, \epsilon) \rangle$ is the amplitude of all paths that
end with the QPDA in control state $q_k$ with an empty stack. Then $f(w) = \sum_{k=1}^{n} |c_k(w)|^2$ sums over all
$q_k \in Q_{\text{accept}}$ and the theorem is proved. □

This representation of the control state, in which every control state occurs in two variables, is neces- 
sary to enforce a consistent series of transitions, since symbols in a context-free derivation have no way of 
communicating with each other once they are created. 

An alternate approach would be to give our productions matrix-valued amplitudes, so that their transitions
can keep track of the state. Our current definition, in which the $c_k$ are simply multiplied componentwise, is
equivalent to using diagonal matrices. Since matrices do not commute in general, we would have to choose 
an order in which to multiply the production amplitudes to define a derivation's amplitude. A leftmost 
depth-first search of a derivation in Greibach normal form would still correspond to a computation path of 
a QPDA. However, our proof of Greibach normal form breaks down because of the way lemma 4.4 of [16] 
changes the shape of the tree. If such grammars can be put in Greibach normal form, then theorem 15 works 
and they are equivalent to QPDAs. If they cannot, they may be more powerful. 

The productions in the above proof look nonunitary because they produce either too much probability, 
since (2) is allowed for any choice of $q_4$, or too little, since (3) and (4) may not correspond to transitions
that are allowed at all. Let us define 

Definition. A QCFL is unitary if it is recognized by a unitary QPDA. 

It is not clear what constraints a quantum grammar needs to meet to be unitary. Nor is it clear whether 
these constraints can be put in a simple form that is preserved by the kinds of transformations we use in 
lemma 14. Perhaps a grammar's productions affect unitarity in a similar way to the rule table of a quantum 
cellular automaton. An algorithm to tell whether a quantum CA is unitary is given in [11]. 

Finally, we note that theorems 15 and 16 have the following corollaries: 

Corollary. Any quantum context-free grammar is equivalent to one in which the production amplitudes $c_k$
do not depend on $k$ except for productions from the initial variable. Any generalized QPDA can be simulated
by one whose transitions never change its control state, for which $Q_{\text{accept}} = Q$, and whose only initial stack
consists of a single symbol. 

It is not clear whether the latter is true in the unitary case. 
3.3 Closure properties of QCFLs

Classical context-free languages are closed under intersection with a regular language. The quantum version 
of this follows easily: 

Lemma 17. If $f$ is a (unitary) QCFL and $g$ is a QRL, then $fg$ is a (unitary) QCFL.

Proof. We simply form the tensor product of the two automata. If $f$ and $g$ have finite-dimensional state
spaces $Q$ and $R$, construct a new QPDA with control states $Q \otimes R$, transition matrices $U'_a = U^f_a \otimes U^g_a$ (recall
that $\otimes$ preserves unitarity), and accepting subspace $H_{\text{accept}} = Q_{\text{accept}} \otimes R_{\text{accept}} \otimes \{\epsilon\}$. □

Classical CFLs are also closed under union, which as before becomes addition: 

Lemma 18. If $f$ and $g$ are QCFLs, then $f + g$ is a QCFL.

Proof. We define a direct sum of two grammars as follows. Suppose the grammars generating $f$ and $g$ have
$m$ and $n$ dimensions, variables $V$ and $W$, and initial variables $I$ and $J$. We will denote their amplitudes by
$c^f_k$ and $c^g_k$. Then create a new grammar with $m + n$ dimensions, variables $V \cup W \cup \{K\}$, and initial variable
$K$, with the productions $K \to I$ and $K \to J$ allowed with amplitudes $c_k = 1$. Other productions are allowed
with $c_k = c^f_k$ for $1 \le k \le m$ and $c_k = c^g_{k-m}$ for $m + 1 \le k \le m + n$. The reader can easily check that this
grammar generates $f + g$. □

We would like to say that a weighted sum $af + bg$, where $a + b = 1$, of unitary QCFLs is unitary. This
is true if the QPDAs accepting $f$ and $g$ have stack alphabets of the same size. Just take the direct sum of
their control state spaces and let both sets of states interpret the stack as if it were their own. However, if
one stack alphabet is bigger than the other, we have to figure out how to handle the dynamics in a unitary
way when one of $f$'s states tries to read one of $g$'s stack symbols. We leave this as a question for the reader.

3.4 The generating functions of QCFLs 

If we define a generating function of a context-free language $L$ that counts multiple derivations, $G_L = \sum_{w \in L} n(w)\, w$,
where $n(w)$ is the number of derivations of $w$ in $L$'s grammar, then $G_L$ is algebraic. That
is, it is a solution to a finite set of polynomial equations in noncommuting variables [18]. If we don't count
multiple derivations and define $G_L = \sum_{w \in L} w$ instead, then $G_L$ is algebraic for unambiguous context-free
languages since each word has a unique derivation [16].

For instance, the Dyck language is generated by the unambiguous grammar $P = \{I \to aIbI,\ I \to \epsilon\}$,
where we have replaced left and right brackets with $a$ and $b$ respectively. Then its generating function obeys
the quadratic equation in noncommuting variables

$$G = aGbG + 1$$

If we set $a = b = z$, this becomes

$$g(z) = z^2 g^2 + 1$$

whose solution is

$$g(z) = \frac{1 - \sqrt{1 - 4z^2}}{2z^2} = 1 + z^2 + 2z^4 + 5z^6 + 14z^8 + \cdots$$

whose $z^{2n}$ coefficient is the Catalan number $\frac{1}{n+1}\binom{2n}{n}$.
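The series expansion can be reproduced from the quadratic equation itself. The following sketch solves $g = z^2 g^2 + 1$ coefficient by coefficient (the coefficient of $z^n$ in $z^2 g^2$ is a convolution of lower-order coefficients) and compares the even coefficients with the closed-form Catalan numbers; the function name is ours.

```python
from math import comb

def dyck_series(max_degree):
    """Coefficients g[0..max_degree] of the power series solving g = z^2 g^2 + 1."""
    g = [0] * (max_degree + 1)
    g[0] = 1
    for n in range(2, max_degree + 1):
        # Coefficient of z^n in z^2 * g^2 is sum_{i+j = n-2} g[i] * g[j].
        g[n] = sum(g[i] * g[n - 2 - i] for i in range(n - 1))
    return g

coefficients = dyck_series(10)
catalan = [comb(2 * n, n) // (n + 1) for n in range(6)]
```

The odd coefficients vanish, since every Dyck word has even length, and `coefficients[2 * n]` agrees with `catalan[n]`.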

The closest we can come to this in the quantum case is the following. 




Definition. The Hadamard square of a formal power series $g$ is the Hadamard product $g^* \odot g$.

Theorem 19. If $f$ is a QCFL, then $G_f$ is a restriction of the Hadamard square of an algebraic power series.
Proof. As in theorem 9, we start with generating functions weighted with complex amplitudes rather than
probabilities. For each dimension $k$ of the grammar write $c$ for $c_k$ and define

$$g_v = \sum_{w \in T^*} c(v \Rightarrow w)\, w$$

This is the generating function of the terminal words $w \in T^*$ that can be derived from a variable $v \in V$,
weighted by the $k$'th amplitudes of each derivation. For a terminal $a \in T$, we define $g_a = a$ since $a$ can only
produce itself. We also use the shorthand

$$g_\beta = g_{\beta_1} g_{\beta_2} \cdots g_{\beta_{|\beta|}}$$

since the words that can be derived from a word $\beta$ are simply concatenations of those that can be derived
from each of $\beta$'s symbols.

Then the $g_v$ obey the following equations, with one term for each production:

$$g_v = \sum_{\beta \in (V \cup T)^*} c(v \to \beta)\, g_\beta$$

each of which is a polynomial of order $\max_{\beta : c(v \to \beta) \ne 0} |\beta|$. This system of equations has an algebraic solution
$g_I$.

If we call the $g_I$ based on the $k$'th amplitude $g_k$, then $G_f$ is the sum of their Hadamard squares

$$G_f = \sum_w f(w)\, w = \sum_w \sum_{k=1}^{n} |c_k(w)|^2\, w = \sum_{k=1}^{n} g_k^* \odot g_k$$

We can write this as a single Hadamard square in the following way. For each dimension $k$ of the grammar,
introduce a new symbol $x_k$. Then if we define $g = \sum_{k=1}^{n} x_k g_k$, we have

$$g^* \odot g = \sum_{k=1}^{n} x_k\, (g_k^* \odot g_k)$$

and $G_f = g^* \odot g$ in the restriction $x_k = 1$ for all $k$. □

Unfortunately, unlike the class of rational series, the class of algebraic series is not closed under Hadamard 
product. This corresponds to the fact that the context-free languages are not closed under intersection. In fact, 
the set of accepting computations of a Turing machine is the intersection of two CFLs, so it is undecidable 
whether two algebraic series have a nonzero Hadamard product [16]. 

This also means that the Hadamard square of an algebraic series can be transcendental. Let $A$ and $B$
be two algebraic series such that $A \odot B$ is transcendental. Then if $C = (A + B)/2$ and $D = (A - B)/2$,
we have $A \odot B = (C \odot C) - (D \odot D)$ and at least one of $C \odot C$ and $D \odot D$ must be transcendental. As a
concrete example, $g(z) = \sum_{n=0}^{\infty} \binom{2n}{n} z^n$ is algebraic, but $g \odot g$ can be shown to be transcendental using the
asymptotic techniques in [13].

Ideally, this result could be used to show that certain inherently ambiguous context-free languages, whose 
generating functions aren't the Hadamard square of an algebraic function, are not QCFLs. Unfortunately, it 
is not obvious how to prove this, even in the case where all the $f(w)$ are 0 or 1.

3.5 Regular grammars 

Although it is painfully obvious at this point, we include the following for completeness. 

Definition. A quantum grammar is regular if only productions of the form $v_1 \to w v_2$ and $v_1 \to w$ have
nonzero amplitudes, where $v_1, v_2 \in V$ are variables and $w \in T^*$ is a (possibly empty) word of terminals.

Theorem 20. A quantum language is a generalized QRL if and only if it is generated by a regular quantum 
grammar. 






Proof. First we show that the language $f$ generated by a regular quantum grammar is a generalized QRL.
Using the techniques of lemma 14, we can convert any regular grammar into one where $|w| = 1$, i.e. all
productions are of the form $v_1 \to a v_2$ or $v_1 \to a$, where $v_1, v_2 \in V$ and $a \in T$.

If there are $m$ variables, then for each dimension $k$ of the grammar we can define a set of $(m + 1)$-dimensional
transition matrices $U_a^{(k)}$:

$$(U_a^{(k)})_{ij} = c_k(v_i \to a v_j) \text{ for } 1 \le i, j \le m, \qquad (U_a^{(k)})_{i, m+1} = c_k(v_i \to a)$$

Then $|c_k(w)| = |s_{\text{init}} U_w^{(k)} P_{\text{accept}}|$, where $s_{\text{init}}$ is the unit vector $(s_{\text{init}})_i = 1$ if $v_i = I$ and 0 otherwise, and
$u P_{\text{accept}} = u_{m+1}$, i.e. $P_{\text{accept}}$ projects onto a vector's $(m + 1)$'st component. Then each $f_k = |c_k(w)|^2$ is a
QRL and by lemma 1 so is their sum $f(w) = \sum_{k=1}^{n} f_k(w) = \sum_{k=1}^{n} |c_k(w)|^2$.

Conversely, let $f$ be a generalized QRL. Its state space is spanned by a set of unit vectors that we identify
with the variables $V$. The accepting subspace $H_{\text{accept}}$ is spanned by a set of unit vectors $h_k$ as in theorem
9, each of which corresponds to one dimension of the grammar. Then define the production amplitudes as
follows:

$$c_k(I \to v) = \langle s_{\text{init}} | v \rangle$$
$$c_k(v_i \to a v_j) = (U_a)_{ij}$$
$$c_k(v_j \to \epsilon) = \langle v_j | h_k \rangle$$

Then $\sum_{k=1}^{n} |c_k(w)|^2 = \sum_{k=1}^{n} |\langle s_{\text{init}} | U_w | h_k \rangle|^2 = |s_{\text{init}} U_w P_{\text{accept}}|^2$ and the theorem is proved. □

Since only the last of the amplitudes in theorem 20 depend on $k$, we can add the following corollary:

Corollary. Any regular grammar is equivalent to one in which the $c_k$ don't depend on $k$ except for productions
of the form $v \to \epsilon$.

Just as the regular languages are a proper subclass of the context-free languages, we can show that the
QRLs are a proper subclass of the QCFLs, in both the unitary and non-unitary cases:

Theorem 21. The QRLs are a proper subclass of the unitary QCFLs, and the generalized QRLs are a proper
subclass of the QCFLs.

Proof. Containment is given in both cases by using the control state of a (unitary) QPDA to simulate a
(unitary) QFA while leaving its stack alone. It is proper because the language $L_=$ of words in $\{a, b\}^*$ with an
equal number of $a$'s and $b$'s is a unitary QCFL (or rather, its characteristic function is) but not a generalized
QRL, as we will now show.

Consider a QPDA with two control states $A$ and $B$ and one stack symbol $x$. The stack will indicate how
many excess $a$'s or $b$'s we have, with the control state indicating which dominates. Then starting with an
empty stack $s_{\text{init}} = (A, \epsilon)$, we can recognize $L_=$ with the transition matrices

[Matrix $U_a$, with rows and columns indexed by $(A, \epsilon)$, $(A, x)$, $(B, x)$, $(A, xx)$, $(B, xx)$, $(A, xxx)$, $(B, xxx)$, $\ldots$, whose nonzero entries are all 1]

(with all other entries zero and $(B, \epsilon)$ left unchanged and unused) and $U_b = U_a^{-1} = U_a^\dagger$. Since both $U_a$ and
$U_b$ are unitary, this is a QPDA and $L_=$ is a unitary QCFL.
On the other hand, $L_=$'s generating function

$$g(z) = \sum_{n=0}^{\infty} \binom{2n}{n} z^{2n} = \frac{1}{\sqrt{1 - 4z^2}}$$

is algebraic but not rational, so $L_=$ is not a generalized QRL by theorem 9. □
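The behaviour of this automaton can be simulated classically, since every transition has amplitude 0 or 1. The sketch below assumes a transition rule read off the prose description: on $a$, control state $A$ pushes an $x$, control state $B$ pops one, and $(B, x)$ moves to $(A, \epsilon)$; on $b$ the inverse permutation is applied. It then counts accepted words of each length for comparison with the binomial coefficients $\binom{2n}{n}$.

```python
from itertools import product
from math import comb

def step(config, symbol):
    """Apply the assumed permutation U_a (or its inverse U_b) to a basis state."""
    state, height = config
    if symbol == "a":
        if state == "A":
            return ("A", height + 1)
        return ("B", height - 1) if height > 1 else ("A", 0)
    # symbol == "b": the inverse of the permutation above
    if state == "B":
        return ("B", height + 1)
    return ("A", height - 1) if height > 0 else ("B", 1)

def accepts(word):
    """Accept iff we end in control state A with an empty stack."""
    config = ("A", 0)
    for symbol in word:
        config = step(config, symbol)
    return config == ("A", 0)

def count_accepted(length):
    """Number of words of the given length over {a, b} that are accepted."""
    return sum(accepts(w) for w in product("ab", repeat=length))
```

Because each symbol acts as a permutation of the basis states, the corresponding matrices are unitary, and the state $(B, \epsilon)$ is never reached from the initial state.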

Since regular grammars are also context-free, theorem 20 is another proof that the generalized QRLs are 
a subclass of the QCFLs. 

3.6 QCFLs and CFLs 

Finally, we will compare our quantum classes to their classical counterparts. Lemma 7 states that any
regular language is a generalized QRL. Similarly, we have (again conflating a language with its characteristic 
function): 

Lemma 22. Any unambiguous context-free language is a QCFL. More specifically, for any unambiguous
CFL $L$ there is a quantum grammar of dimensionality 1 such that $c(w) = \chi_L(w)$.

Proof. Simply give allowed and disallowed productions amplitudes 1 and 0, respectively. Since $L$ is
unambiguous, each allowed word has exactly one derivation, so $c(w) = \chi_L(w)$. Since 0 and 1 are their own squares,
we also have $f(w) = |c(w)|^2 = \chi_L(w)$. □

Using the quantum effect of destructive interference, we can get the following nonclassical result, showing 
that quantum context-free grammars and QPDAs are strictly more powerful than classical ones: 

Theorem 23. // Li and are unambiguous context-free languages, their symmetric difference Li A L2 = 
(ii U L2) - {Li n L2) is a QCFL. 

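Proof. If $L_1$ and $L_2$ are generated by grammars with initial variables $I_1$ and $I_2$, then create a new 
initial variable $I$ and allow the productions $I \to I_1$ and $I \to I_2$ with amplitudes 1 and $-1$, 
respectively. Writing $c^{(1)}(w)$ and $c^{(2)}(w)$ for the contributions of the derivations that begin with 
$I \to I_1$ and $I \to I_2$, so that $c^{(1)} = \chi_{L_1}$ and $c^{(2)} = -\chi_{L_2}$, we have 
$f(w) = |c^{(1)}(w) + c^{(2)}(w)|^2 = 1$ if $w$ is in $L_1$ or $L_2$, but not both, and 0 otherwise. □ 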

Corollary. There are QCFLs that are not context-free. 

Proof. Let $L_1 = \{a^i b^i c^j\}$ and $L_2 = \{a^i b^j c^j\}$, both of which are unambiguous context-free. Then 

$$L_1 \,\Delta\, L_2 = \{a^i b^j c^k \mid i = j \text{ or } j = k, \text{ but not both}\}$$

is a QCFL, but it can be shown to be non-context-free using the pumping lemma for context-free languages 
[16]. □ 
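
The interference argument of theorem 23 and its corollary can be sketched in a few lines of Python. This is an illustration under stated assumptions: the exponents i, j, k are taken to range over the non-negative integers, and the characteristic functions `chi1` and `chi2` are computed directly rather than from grammars. All names are hypothetical.

```python
import re

def exponents(word):
    """Return (i, j, k) if word has the form a^i b^j c^k, else None."""
    match = re.fullmatch(r"(a*)(b*)(c*)", word)
    if match is None:
        return None
    return tuple(len(group) for group in match.groups())

def chi1(word):
    """Characteristic function of L_1 = {a^i b^i c^j}."""
    e = exponents(word)
    return int(e is not None and e[0] == e[1])

def chi2(word):
    """Characteristic function of L_2 = {a^i b^j c^j}."""
    e = exponents(word)
    return int(e is not None and e[1] == e[2])

def f(word):
    """Acceptance probability with amplitudes 1 and -1 on the two branches."""
    c = 1 * chi1(word) + (-1) * chi2(word)
    return abs(c) ** 2
```

Here `f("aabbc")` and `f("abbcc")` are 1, while `f("abc")` is 0 because the two derivations cancel, and `f("aabccc")` is 0 because there is no derivation at all.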

We can use interference in another amusing way: 

Theorem 24. If $L_1$, $L_2$, and $L_3$ are unambiguous context-free languages, then 
$(L_1 \cup L_2 \cup L_3) - (L_1 \cap L_2 \cap L_3)$ is a QCFL. 

Proof. Create a new initial variable $I$ and allow the productions $I \to I_1$, $I \to I_2$, and $I \to I_3$ 
with amplitudes 1, $e^{2\pi i/3}$, and $e^{4\pi i/3}$, respectively. Since these are 120° apart, 
$f = |c^{(1)}(w) + c^{(2)}(w) + c^{(3)}(w)|^2 = 1$ if $w$ is in one or two, but not all three, of the three 
languages, and $f = 0$ if $w$ is in none or in all three. □ 
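
The arithmetic behind this proof is easy to verify. The Python sketch below, with hypothetical names, sums the three phases over every possible pattern of membership in the three languages and returns the resulting acceptance probability, rounded to suppress floating-point error.

```python
import cmath
from itertools import product

# The three amplitudes 1, e^{2 pi i/3}, e^{4 pi i/3}, which are 120 degrees apart.
PHASES = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def acceptance(memberships):
    """|sum of the phases of the languages containing w|^2.

    memberships is a triple of 0/1 flags saying whether w is in L_1, L_2, L_3.
    """
    amplitude = sum(phase for phase, member in zip(PHASES, memberships) if member)
    return abs(amplitude) ** 2

def table():
    """Acceptance probability for each of the eight membership patterns."""
    return {m: round(acceptance(m), 9) for m in product((0, 1), repeat=3)}
```

Calling `table()` gives 1 for every pattern with one or two memberships and 0 for the patterns `(0, 0, 0)` and `(1, 1, 1)`.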

Unfortunately, there are no sets of four or more vectors with norm 1 such that the sum of any subset of 
them has norm 1, so this is as far as this argument goes.¹ 

The next logical questions are whether all languages whose characteristic functions are QCFLs are 
context-sensitive [16] and whether theorem 19 can be used to show that some inherently ambiguous CFLs, 
with transcendental generating functions, are not QCFLs. 

¹ We are indebted to Jan-Christoph Puchta, David Joyner, Benjamin Lotto, and Dan Asimov for providing proofs 
of this fact. 

4 Conclusion and directions for further work 

We have defined quantum versions of finite-state automata, push-down automata, and context-free grammars, 
and shown that many classical results carry over into the quantum case. We leave the reader with a set of 
open questions, some of which have already been mentioned above: 

1. What happens when we remove the real-time restriction, allowing the machine to choose when to read 
an input symbol? This adds no power to classical DFAs and PDAs [16]. Does it in the quantum case? 

2. What about two-way automata, that can choose to move left or right on the input? This adds nothing 
to classical DFAs [16] or real-valued stochastic finite-state automata [17]. Does it make QFAs more 
powerful? 

3. Is there a natural quantum analog of rational transductions [3], under which QRLs and QCFLs are closed 
without losing unitarity? 

4. Are QRLs incomparable with stochastic and pseudo-stochastic functions? 

5. Is each QRL recognized by a unique QFA (up to isomorphism) with the minimal number of dimensions? 
It might be possible to determine the eigenvalues of $U_w$ for all $w$ by Fourier analysis of $f(u w^k v)$. We 
could then reconstruct the $U_a$, since any set of matrices is determined by their eigenvalues and those of 
their products [15]. 

6. Can grammars with noncommuting matrix-valued amplitudes be defined in a consistent way and put in 
Greibach normal form? 

7. Is there a simple way of determining whether a quantum context-free grammar generates a unitary 
QCFL? 

8. Can a QPDA be simulated by one that never changes its control state, and for which 
$Q_{\mathrm{accept}} = Q$, without losing unitarity? 

9. Is a weighted sum of unitary QCFLs a unitary QCFL, even when their QPDAs have stack alphabets of 
different sizes? 

10. Is there a quantum analog to the Dyck languages $D_k$ and to Chomsky's theorem that every CFL is a 
homomorphic image of the intersection of $D_k$ with a regular language? 

11. Are the QCFLs contained in the context-sensitive languages? 

12. Are there CFLs that are not QCFLs? 

13. Can we define quantum versions of other real-time recognizer classes, such as queue automata [6], counter 
automata [16], and real-time Turing machines [2, 10]? 

14. Are languages recognized by real-time QTMs the product of two QCFLs, analogous to intersection in 
the classical case [16]? 

15. We can easily define quantum context-sensitive grammars. Do they correspond to a quantum version of 
linear-bounded Turing machines [16]? 

We hope that quantum grammars and automata will be fruitful areas of research and that they will be useful 
to people studying quantum computation. 